Overview

Dataset statistics

Number of variables            15
Number of observations         32561
Missing cells                  0
Missing cells (%)              0.0%
Duplicate rows                 24
Duplicate rows (%)             0.1%
Total size in memory           3.7 MiB
Average record size in memory  120.0 B
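
The size and duplicate figures above can be cross-checked with a little arithmetic. The sketch below assumes every column is stored as an 8-byte value; that is an assumption inferred from the per-variable memory size of 254.4 KiB reported later, not something the report states directly.

```python
# Figures copied from the dataset statistics above.
n_variables = 15
n_observations = 32561
duplicate_rows = 24

# Assumption (not stated in the report): each value occupies 8 bytes, e.g. int64.
BYTES_PER_VALUE = 8

record_size_bytes = n_variables * BYTES_PER_VALUE
column_size_kib = n_observations * BYTES_PER_VALUE / 2**10
total_size_mib = n_observations * record_size_bytes / 2**20
duplicate_share_pct = 100 * duplicate_rows / n_observations

print(f"Average record size in memory: {record_size_bytes} B")
print(f"Memory size per variable:      {column_size_kib:.1f} KiB")
print(f"Total size in memory:          {total_size_mib:.1f} MiB")
print(f"Duplicate rows (%):            {duplicate_share_pct:.1f}%")
```

Under that assumption the rounded results agree with the reported 120.0 B, 254.4 KiB, 3.7 MiB and 0.1%.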

Variable types

NUM   13
BOOL  2

Warnings

Dataset has 24 (0.1%) duplicate rows   Duplicates
workclass has 1836 (5.6%) zeros        Zeros
education has 933 (2.9%) zeros         Zeros
marital-status has 4443 (13.6%) zeros  Zeros
occupation has 1843 (5.7%) zeros       Zeros
relationship has 13193 (40.5%) zeros   Zeros
capital-gain has 29849 (91.7%) zeros   Zeros
capital-loss has 31042 (95.3%) zeros   Zeros
native-country has 583 (1.8%) zeros    Zeros

Reproduction

Analysis started        2021-02-01 12:54:29.644432
Analysis finished       2021-02-01 12:56:04.320763
Duration                1 minute and 34.68 seconds
Software version        pandas-profiling v2.9.0
Download configuration  config.yaml

Variables

age
Real number (ℝ≥0)

Distinct                         73
Distinct (%)                     0.2%
Missing                          0
Missing (%)                      0.0%
Infinite                         0
Infinite (%)                     0.0%
Mean                             38.58164676
Minimum                          17
Maximum                          90
Zeros                            0
Zeros (%)                        0.0%
Memory size                      254.4 KiB

Quantile statistics

Minimum                          17
5-th percentile                  19
Q1                               28
median                           37
Q3                               48
95-th percentile                 63
Maximum                          90
Range                            73
Interquartile range (IQR)        20

Descriptive statistics

Standard deviation               13.64043255
Coefficient of variation (CV)    0.3535471837
Kurtosis                         -0.1661274596
Mean                             38.58164676
Median Absolute Deviation (MAD)  10
Skewness                         0.5587433694
Sum                              1256257
Variance                         186.0614002
Monotonicity                     Not monotonic
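
Several of the statistics reported for age are simple functions of the others. The following sketch recomputes them from the reported figures only (it does not touch the underlying data) to show how range, IQR, mean, coefficient of variation and variance relate; the variable names are illustrative.

```python
# Values copied from the age statistics above.
n_observations = 32561
total = 1256257            # Sum
minimum, maximum = 17, 90
q1, q3 = 28, 48
std = 13.64043255          # Standard deviation

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
mean = total / n_observations     # Mean = Sum / number of observations
cv = std / mean                   # Coefficient of variation (CV)
variance = std ** 2               # Variance = squared standard deviation

print(f"Range    {value_range}")
print(f"IQR      {iqr}")
print(f"Mean     {mean:.8f}")
print(f"CV       {cv:.10f}")
print(f"Variance {variance:.7f}")
```

Up to rounding, the derived values agree with the figures listed in the report.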
Histogram with fixed size bins (bins=50)

Value                 Count  Frequency (%)
36                    898    2.8%
31                    888    2.7%
34                    886    2.7%
23                    877    2.7%
35                    876    2.7%
33                    875    2.7%
28                    867    2.7%
30                    861    2.6%
37                    858    2.6%
25                    841    2.6%
Other values (63)     23834  73.2%

Value                 Count  Frequency (%)
17                    395    1.2%
18                    550    1.7%
19                    712    2.2%
20                    753    2.3%
21                    720    2.2%

Value                 Count  Frequency (%)
90                    43     0.1%
88                    3      < 0.1%
87                    1      < 0.1%
86                    1      < 0.1%
85                    3      < 0.1%

workclass
Real number (ℝ≥0)

ZEROS

Distinct                         9
Distinct (%)                     < 0.1%
Missing                          0
Missing (%)                      0.0%
Infinite                         0
Infinite (%)                     0.0%
Mean                             3.868892233
Minimum                          0
Maximum                          8
Zeros                            1836
Zeros (%)                        5.6%
Memory size                      254.4 KiB

Quantile statistics

Minimum                          0
5-th percentile                  0
Q1                               4
median                           4
Q3                               4
95-th percentile                 6
Maximum                          8
Range                            8
Interquartile range (IQR)        0

Descriptive statistics

Standard deviation               1.455959761
Coefficient of variation (CV)    0.3763247134
Kurtosis                         1.682386955
Mean                             3.868892233
Median Absolute Deviation (MAD)  0
Skewness                         -0.752024012
Sum                              125975
Variance                         2.119818825
Monotonicity                     Not monotonic
Histogram with fixed size bins (bins=9)

Value                 Count  Frequency (%)
4                     22696  69.7%
6                     2541   7.8%
2                     2093   6.4%
0                     1836   5.6%
7                     1298   4.0%
5                     1116   3.4%
1                     960    2.9%
8                     14     < 0.1%
3                     7      < 0.1%

Value                 Count  Frequency (%)
0                     1836   5.6%
1                     960    2.9%
2                     2093   6.4%
3                     7      < 0.1%
4                     22696  69.7%

Value                 Count  Frequency (%)
8                     14     < 0.1%
7                     1298   4.0%
6                     2541   7.8%
5                     1116   3.4%
4                     22696  69.7%

fnlwgt
Real number (ℝ≥0)

Distinct                         21648
Distinct (%)                     66.5%
Missing                          0
Missing (%)                      0.0%
Infinite                         0
Infinite (%)                     0.0%
Mean                             189778.3665
Minimum                          12285
Maximum                          1484705
Zeros                            0
Zeros (%)                        0.0%
Memory size                      254.4 KiB

Quantile statistics

Minimum                          12285
5-th percentile                  39460
Q1                               117827
median                           178356
Q3                               237051
95-th percentile                 379682
Maximum                          1484705
Range                            1472420
Interquartile range (IQR)        119224

Descriptive statistics

Standard deviation               105549.9777
Coefficient of variation (CV)    0.5561749721
Kurtosis                         6.218810978
Mean                             189778.3665
Median Absolute Deviation (MAD)  59894
Skewness                         1.446980095
Sum                              6179373392
Variance                         1.114079779e+10
Monotonicity                     Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count  Frequency (%)
203488                13     < 0.1%
123011                13     < 0.1%
164190                13     < 0.1%
148995                12     < 0.1%
113364                12     < 0.1%
121124                12     < 0.1%
126675                12     < 0.1%
126569                11     < 0.1%
123983                11     < 0.1%
155659                11     < 0.1%
Other values (21638)  32441  99.6%

Value                 Count  Frequency (%)
12285                 1      < 0.1%
13769                 1      < 0.1%
14878                 1      < 0.1%
18827                 1      < 0.1%
19214                 1      < 0.1%

Value                 Count  Frequency (%)
1484705               1      < 0.1%
1455435               1      < 0.1%
1366120               1      < 0.1%
1268339               1      < 0.1%
1226583               1      < 0.1%

education
Real number (ℝ≥0)

ZEROS

Distinct                         16
Distinct (%)                     < 0.1%
Missing                          0
Missing (%)                      0.0%
Infinite                         0
Infinite (%)                     0.0%
Mean                             10.29820951
Minimum                          0
Maximum                          15
Zeros                            933
Zeros (%)                        2.9%
Memory size                      254.4 KiB

Quantile statistics

Minimum                          0
5-th percentile                  1
Q1                               9
median                           11
Q3                               12
95-th percentile                 15
Maximum                          15
Range                            15
Interquartile range (IQR)        3

Descriptive statistics

Standard deviation               3.870263951
Coefficient of variation (CV)    0.3758191116
Kurtosis                         0.6806551901
Mean                             10.29820951
Median Absolute Deviation (MAD)  2
Skewness                         -0.9340424374
Sum                              335320
Variance                         14.97894305
Monotonicity                     Not monotonic
Histogram with fixed size bins (bins=16)

Value                 Count  Frequency (%)
11                    10501  32.3%
15                    7291   22.4%
9                     5355   16.4%
12                    1723   5.3%
8                     1382   4.2%
1                     1175   3.6%
7                     1067   3.3%
0                     933    2.9%
5                     646    2.0%
14                    576    1.8%
Other values (6)      1912   5.9%

Value                 Count  Frequency (%)
0                     933    2.9%
1                     1175   3.6%
2                     433    1.3%
3                     168    0.5%
4                     333    1.0%

Value                 Count  Frequency (%)
15                    7291   22.4%
14                    576    1.8%
13                    51     0.2%
12                    1723   5.3%
11                    10501  32.3%

education-num
Real number (ℝ≥0)

Distinct                         16
Distinct (%)                     < 0.1%
Missing                          0
Missing (%)                      0.0%
Infinite                         0
Infinite (%)                     0.0%
Mean                             10.08067934
Minimum                          1
Maximum                          16
Zeros                            0
Zeros (%)                        0.0%
Memory size                      254.4 KiB

Quantile statistics

Minimum                          1
5-th percentile                  5
Q1                               9
median                           10
Q3                               12
95-th percentile                 14
Maximum                          16
Range                            15
Interquartile range (IQR)        3

Descriptive statistics

Standard deviation               2.572720332
Coefficient of variation (CV)    0.2552129916
Kurtosis                         0.6234440748
Mean                             10.08067934
Median Absolute Deviation (MAD)  1
Skewness                         -0.3116758679
Sum                              328237
Variance                         6.618889907
Monotonicity                     Not monotonic
Histogram with fixed size bins (bins=16)

Value                 Count  Frequency (%)
9                     10501  32.3%
10                    7291   22.4%
13                    5355   16.4%
14                    1723   5.3%
11                    1382   4.2%
7                     1175   3.6%
12                    1067   3.3%
6                     933    2.9%
4                     646    2.0%
15                    576    1.8%
Other values (6)      1912   5.9%

Value                 Count  Frequency (%)
1                     51     0.2%
2                     168    0.5%
3                     333    1.0%
4                     646    2.0%
5                     514    1.6%

Value                 Count  Frequency (%)
16                    413    1.3%
15                    576    1.8%
14                    1723   5.3%
13                    5355   16.4%
12                    1067   3.3%

marital-status
Real number (ℝ≥0)

ZEROS

Distinct                         7
Distinct (%)                     < 0.1%
Missing                          0
Missing (%)                      0.0%
Infinite                         0
Infinite (%)                     0.0%
Mean                             2.611836246
Minimum                          0
Maximum                          6
Zeros                            4443
Zeros (%)                        13.6%
Memory size                      254.4 KiB

Quantile statistics

Minimum                          0
5-th percentile                  0
Q1                               2
median                           2
Q3                               4
95-th percentile                 5
Maximum                          6
Range                            6
Interquartile range (IQR)        2

Descriptive statistics

Standard deviation               1.506221723
Coefficient of variation (CV)    0.5766907192
Kurtosis                         -0.5360804148
Mean                             2.611836246
Median Absolute Deviation (MAD)  2
Skewness                         -0.01350813803
Sum                              85044
Variance                         2.268703879
Monotonicity                     Not monotonic
Histogram with fixed size bins (bins=7)

Value                 Count  Frequency (%)
2                     14976  46.0%
4                     10683  32.8%
0                     4443   13.6%
5                     1025   3.1%
6                     993    3.0%
3                     418    1.3%
1                     23     0.1%

Value                 Count  Frequency (%)
0                     4443   13.6%
1                     23     0.1%
2                     14976  46.0%
3                     418    1.3%
4                     10683  32.8%

Value                 Count  Frequency (%)
6                     993    3.0%
5                     1025   3.1%
4                     10683  32.8%
3                     418    1.3%
2                     14976  46.0%

occupation
Real number (ℝ≥0)

ZEROS

Distinct                         15
Distinct (%)                     < 0.1%
Missing                          0
Missing (%)                      0.0%
Infinite                         0
Infinite (%)                     0.0%
Mean                             6.572740395
Minimum                          0
Maximum                          14
Zeros                            1843
Zeros (%)                        5.7%
Memory size                      254.4 KiB

Quantile statistics

Minimum                          0
5-th percentile                  0
Q1                               3
median                           7
Q3                               10
95-th percentile                 13
Maximum                          14
Range                            14
Interquartile range (IQR)        7

Descriptive statistics

Standard deviation               4.228856803
Coefficient of variation (CV)    0.6433932499
Kurtosis                         -1.234720733
Mean                             6.572740395
Median Absolute Deviation (MAD)  4
Skewness                         0.1145833164
Sum                              214015
Variance                         17.88322986
Monotonicity                     Not monotonic
Histogram with fixed size bins (bins=15)

Value                 Count  Frequency (%)
10                    4140   12.7%
3                     4099   12.6%
4                     4066   12.5%
1                     3770   11.6%
12                    3650   11.2%
8                     3295   10.1%
7                     2002   6.1%
0                     1843   5.7%
14                    1597   4.9%
6                     1370   4.2%
Other values (5)      2729   8.4%

Value                 Count  Frequency (%)
0                     1843   5.7%
1                     3770   11.6%
2                     9      < 0.1%
3                     4099   12.6%
4                     4066   12.5%

Value                 Count  Frequency (%)
14                    1597   4.9%
13                    928    2.9%
12                    3650   11.2%
11                    649    2.0%
10                    4140   12.7%

relationship
Real number (ℝ≥0)

ZEROS

Distinct                         6
Distinct (%)                     < 0.1%
Missing                          0
Missing (%)                      0.0%
Infinite                         0
Infinite (%)                     0.0%
Mean                             1.446362212
Minimum                          0
Maximum                          5
Zeros                            13193
Zeros (%)                        40.5%
Memory size                      254.4 KiB

Quantile statistics

Minimum                          0
5-th percentile                  0
Q1                               0
median                           1
Q3                               3
95-th percentile                 4
Maximum                          5
Range                            5
Interquartile range (IQR)        3

Descriptive statistics

Standard deviation               1.60677095
Coefficient of variation (CV)    1.110904956
Kurtosis                         -0.7683583398
Mean                             1.446362212
Median Absolute Deviation (MAD)  1
Skewness                         0.7868177781
Sum                              47095
Variance                         2.581712887
Monotonicity                     Not monotonic
Histogram with fixed size bins (bins=6)

Value                 Count  Frequency (%)
0                     13193  40.5%
1                     8305   25.5%
3                     5068   15.6%
4                     3446   10.6%
5                     1568   4.8%
2                     981    3.0%

Value                 Count  Frequency (%)
0                     13193  40.5%
1                     8305   25.5%
2                     981    3.0%
3                     5068   15.6%
4                     3446   10.6%

Value                 Count  Frequency (%)
5                     1568   4.8%
4                     3446   10.6%
3                     5068   15.6%
2                     981    3.0%
1                     8305   25.5%

race
Real number (ℝ≥0)

Distinct                         5
Distinct (%)                     < 0.1%
Missing                          0
Missing (%)                      0.0%
Infinite                         0
Infinite (%)                     0.0%
Mean                             3.665857928
Minimum                          0
Maximum                          4
Zeros                            311
Zeros (%)                        1.0%
Memory size                      254.4 KiB

Quantile statistics

Minimum                          0
5-th percentile                  2
Q1                               4
median                           4
Q3                               4
95-th percentile                 4
Maximum                          4
Range                            4
Interquartile range (IQR)        0

Descriptive statistics

Standard deviation               0.8488056043
Coefficient of variation (CV)    0.2315435079
Kurtosis                         4.876310395
Mean                             3.665857928
Median Absolute Deviation (MAD)  0
Skewness                         -2.435386267
Sum                              119364
Variance                         0.7204709539
Monotonicity                     Not monotonic
Histogram with fixed size bins (bins=5)

Value                 Count  Frequency (%)
4                     27816  85.4%
2                     3124   9.6%
1                     1039   3.2%
0                     311    1.0%
3                     271    0.8%

Value                 Count  Frequency (%)
0                     311    1.0%
1                     1039   3.2%
2                     3124   9.6%
3                     271    0.8%
4                     27816  85.4%

Value                 Count  Frequency (%)
4                     27816  85.4%
3                     271    0.8%
2                     3124   9.6%
1                     1039   3.2%
0                     311    1.0%

sex
Boolean

Distinct                         2
Distinct (%)                     < 0.1%
Missing                          0
Missing (%)                      0.0%
Memory size                      254.4 KiB

Value                 Count  Frequency (%)
1                     21790  66.9%
0                     10771  33.1%

capital-gain
Real number (ℝ≥0)

ZEROS

Distinct                         119
Distinct (%)                     0.4%
Missing                          0
Missing (%)                      0.0%
Infinite                         0
Infinite (%)                     0.0%
Mean                             1077.648844
Minimum                          0
Maximum                          99999
Zeros                            29849
Zeros (%)                        91.7%
Memory size                      254.4 KiB

Quantile statistics

Minimum                          0
5-th percentile                  0
Q1                               0
median                           0
Q3                               0
95-th percentile                 5013
Maximum                          99999
Range                            99999
Interquartile range (IQR)        0

Descriptive statistics

Standard deviation               7385.292085
Coefficient of variation (CV)    6.853152702
Kurtosis                         154.7994379
Mean                             1077.648844
Median Absolute Deviation (MAD)  0
Skewness                         11.95384769
Sum                              35089324
Variance                         54542539.18
Monotonicity                     Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count  Frequency (%)
0                     29849  91.7%
15024                 347    1.1%
7688                  284    0.9%
7298                  246    0.8%
99999                 159    0.5%
3103                  97     0.3%
5178                  97     0.3%
4386                  70     0.2%
5013                  69     0.2%
8614                  55     0.2%
Other values (109)    1288   4.0%

Value                 Count  Frequency (%)
0                     29849  91.7%
114                   6      < 0.1%
401                   2      < 0.1%
594                   34     0.1%
914                   8      < 0.1%

Value                 Count  Frequency (%)
99999                 159    0.5%
41310                 2      < 0.1%
34095                 5      < 0.1%
27828                 34     0.1%
25236                 11     < 0.1%

capital-loss
Real number (ℝ≥0)

ZEROS

Distinct                         92
Distinct (%)                     0.3%
Missing                          0
Missing (%)                      0.0%
Infinite                         0
Infinite (%)                     0.0%
Mean                             87.30382973
Minimum                          0
Maximum                          4356
Zeros                            31042
Zeros (%)                        95.3%
Memory size                      254.4 KiB

Quantile statistics

Minimum                          0
5-th percentile                  0
Q1                               0
median                           0
Q3                               0
95-th percentile                 0
Maximum                          4356
Range                            4356
Interquartile range (IQR)        0

Descriptive statistics

Standard deviation               402.9602186
Coefficient of variation (CV)    4.615607584
Kurtosis                         20.37680171
Mean                             87.30382973
Median Absolute Deviation (MAD)  0
Skewness                         4.594629122
Sum                              2842700
Variance                         162376.9378
Monotonicity                     Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count  Frequency (%)
0                     31042  95.3%
1902                  202    0.6%
1977                  168    0.5%
1887                  159    0.5%
1485                  51     0.2%
1848                  51     0.2%
2415                  49     0.2%
1602                  47     0.1%
1740                  42     0.1%
1590                  40     0.1%
Other values (82)     710    2.2%

Value                 Count  Frequency (%)
0                     31042  95.3%
155                   1      < 0.1%
213                   4      < 0.1%
323                   3      < 0.1%
419                   3      < 0.1%

Value                 Count  Frequency (%)
4356                  3      < 0.1%
3900                  2      < 0.1%
3770                  2      < 0.1%
3683                  2      < 0.1%
3004                  2      < 0.1%

hours-per-week
Real number (ℝ≥0)

Distinct                         94
Distinct (%)                     0.3%
Missing                          0
Missing (%)                      0.0%
Infinite                         0
Infinite (%)                     0.0%
Mean                             40.43745585
Minimum                          1
Maximum                          99
Zeros                            0
Zeros (%)                        0.0%
Memory size                      254.4 KiB

Quantile statistics

Minimum                          1
5-th percentile                  18
Q1                               40
median                           40
Q3                               45
95-th percentile                 60
Maximum                          99
Range                            98
Interquartile range (IQR)        5

Descriptive statistics

Standard deviation               12.34742868
Coefficient of variation (CV)    0.3053463286
Kurtosis                         2.916686796
Mean                             40.43745585
Median Absolute Deviation (MAD)  3
Skewness                         0.2276425368
Sum                              1316684
Variance                         152.4589951
Monotonicity                     Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count  Frequency (%)
40                    15217  46.7%
50                    2819   8.7%
45                    1824   5.6%
60                    1475   4.5%
35                    1297   4.0%
20                    1224   3.8%
30                    1149   3.5%
55                    694    2.1%
25                    674    2.1%
48                    517    1.6%
Other values (84)     5671   17.4%

Value                 Count  Frequency (%)
1                     20     0.1%
2                     32     0.1%
3                     39     0.1%
4                     54     0.2%
5                     60     0.2%

Value                 Count  Frequency (%)
99                    85     0.3%
98                    11     < 0.1%
97                    2      < 0.1%
96                    5      < 0.1%
95                    2      < 0.1%

native-country
Real number (ℝ≥0)

ZEROS

Distinct                         42
Distinct (%)                     0.1%
Missing                          0
Missing (%)                      0.0%
Infinite                         0
Infinite (%)                     0.0%
Mean                             36.71886613
Minimum                          0
Maximum                          41
Zeros                            583
Zeros (%)                        1.8%
Memory size                      254.4 KiB

Quantile statistics

Minimum                          0
5-th percentile                  19
Q1                               39
median                           39
Q3                               39
95-th percentile                 39
Maximum                          41
Range                            41
Interquartile range (IQR)        0

Descriptive statistics

Standard deviation               7.823781904
Coefficient of variation (CV)    0.2130725354
Kurtosis                         12.53305268
Mean                             36.71886613
Median Absolute Deviation (MAD)  0
Skewness                         -3.658303295
Sum                              1195603
Variance                         61.21156328
Monotonicity                     Not monotonic
Histogram with fixed size bins (bins=42)

Value                 Count  Frequency (%)
39                    29170  89.6%
26                    643    2.0%
0                     583    1.8%
30                    198    0.6%
11                    137    0.4%
2                     121    0.4%
33                    114    0.4%
8                     106    0.3%
19                    100    0.3%
5                     95     0.3%
Other values (32)     1294   4.0%

Value                 Count  Frequency (%)
0                     583    1.8%
1                     19     0.1%
2                     121    0.4%
3                     75     0.2%
4                     59     0.2%

Value                 Count  Frequency (%)
41                    16     < 0.1%
40                    67     0.2%
39                    29170  89.6%
38                    19     0.1%
37                    18     0.1%

income
Boolean

Distinct                         2
Distinct (%)                     < 0.1%
Missing                          0
Missing (%)                      0.0%
Memory size                      254.4 KiB

Value                 Count  Frequency (%)
0                     24720  75.9%
1                     7841   24.1%

Interactions

[169 pairwise interaction plots; figures not reproduced]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a perfectly linear relationship the slope of the line (its angle to the x-axis) does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
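
As a minimal sketch of that definition (plain Python, not the routine the report itself uses; the function name `pearson_r` and the sample data are illustrative):

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of products and squares are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(pearson_r(x, y))
# Shifting and rescaling x leaves r unchanged (location/scale invariance).
print(pearson_r([10 * v + 5 for v in x], y))
```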

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
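
The rank-based calculation can be sketched as follows. This is an illustrative plain-Python version; giving tied values the average of their ranks is an assumption made here (a common convention), and the helper names are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Pearson's r applied to the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]    # monotonic but not linear
print(spearman_rho(x, y))  # the ranks of x and y coincide
print(pearson_r(x, y))     # below 1 because the relationship is not linear
```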

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
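
A direct, quadratic-time sketch of the formula as stated is shown below. It corresponds to the variant usually called tau-a, which makes no adjustment for ties; library implementations often use a tie-corrected variant instead, so results can differ when the data contain ties. The function name is illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # both variables move in the same direction
        elif product < 0:
            discordant += 1   # the variables move in opposite directions
        # product == 0 means a tie in x or y; the pair counts as neither
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(kendall_tau_a(x, y))  # 8 concordant and 2 discordant pairs out of 10
```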

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Missing values

[Two missing-value figures; not reproduced]

Sample

First rows

       age  workclass  fnlwgt  education  education-num  marital-status  occupation  relationship  race  sex  capital-gain  capital-loss  hours-per-week  native-country  income
0      39   7          77516   9          13             4               1           1             4     1    2174          0             40              39              0
1      50   6          83311   9          13             2               4           0             4     1    0             0             13              39              0
2      38   4          215646  11         9              0               6           1             4     1    0             0             40              39              0
3      53   4          234721  1          7              2               6           0             2     1    0             0             40              39              0
4      28   4          338409  9          13             2               10          5             2     0    0             0             40              5               0
5      37   4          284582  12         14             2               4           5             4     0    0             0             40              39              0
6      49   4          160187  6          5              3               8           1             2     0    0             0             16              23              0
7      52   6          209642  11         9              2               4           0             4     1    0             0             45              39              1
8      31   4          45781   12         14             4               10          1             4     0    14084         0             50              39              1
9      42   4          159449  9          13             2               4           0             4     1    5178          0             40              39              1

Last rows

       age  workclass  fnlwgt  education  education-num  marital-status  occupation  relationship  race  sex  capital-gain  capital-loss  hours-per-week  native-country  income
32551  32   4          34066   0          6              2               6           0             0     1    0             0             40              39              0
32552  43   4          84661   8          11             2               12          0             4     1    0             0             45              39              0
32553  32   4          116138  12         14             4               13          1             1     1    0             0             11              36              0
32554  53   4          321865  12         14             2               4           0             4     1    0             0             40              39              1
32555  22   4          310152  15         10             4               11          1             4     1    0             0             40              39              0
32556  27   4          257302  7          12             2               13          5             4     0    0             0             38              39              0
32557  40   4          154374  11         9              2               7           0             4     1    0             0             40              39              1
32558  58   4          151910  11         9              6               1           4             4     0    0             0             40              39              0
32559  22   4          201490  11         9              4               1           3             4     1    0             0             20              39              0
32560  52   5          287927  11         9              2               4           5             4     0    15024         0             40              39              1

Duplicate rows

Most frequent

       age  workclass  fnlwgt  education  education-num  marital-status  occupation  relationship  race  sex  capital-gain  capital-loss  hours-per-week  native-country  income  count
8      25   4          195994  3          2              4               9           1             4     0    0             0             40              13              0       3
0      19   4          97261   11         9              4               5           1             4     1    0             0             40              39              0       2
1      19   4          138153  15         10             4               1           3             4     0    0             0             10              39              0       2
2      19   4          146679  15         10             4               4           3             2     1    0             0             30              39              0       2
3      19   4          251579  15         10             4               8           3             4     1    0             0             14              39              0       2
4      20   4          107658  15         10             4               13          1             4     0    0             0             10              39              0       2
5      21   4          243368  13         1              4               5           1             4     1    0             0             50              26              0       2
6      21   4          250051  15         10             4               10          3             4     0    0             0             10              39              0       2
7      23   4          240137  4          3              4               6           1             4     1    0             0             55              26              0       2
9      25   4          308144  9          13             4               3           1             4     1    0             0             40              26              0       2
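
A table of this shape (each repeated row with the number of times it occurs, most frequent first) can be produced by counting identical rows. The sketch below is one plain-Python way to do so and is not necessarily how the report computes it; the function name and the three-column toy rows are hypothetical.

```python
from collections import Counter

def duplicate_row_counts(rows):
    """Return (row, count) pairs for rows occurring more than once, most frequent first."""
    counts = Counter(tuple(row) for row in rows)
    duplicates = [(row, n) for row, n in counts.items() if n > 1]
    duplicates.sort(key=lambda item: item[1], reverse=True)
    return duplicates

# Hypothetical toy rows, not taken from the dataset.
rows = [
    (25, 4, 0),
    (19, 4, 1),
    (25, 4, 0),
    (30, 2, 1),
    (19, 4, 1),
    (25, 4, 0),
]
for row, n in duplicate_row_counts(rows):
    print(row, n)
```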